A Shiny App on Corruption Data

Author

Robert W. Walker

Published

March 3, 2023

robertwwalker.github.io

Last updated: 2023-03-17 09:34:01

Timezone: America/Los_Angeles

Transparency Data

Transparency International provides a wealth of interesting data; I want to work with their Corruption Perceptions Index (CPI). The data can be obtained as an Excel spreadsheet. Here’s a brief look at the file. The main object of interest, throughout, is cpi_score – the corruption perceptions index score.

Excel file

The first two rows of these data will need to be skipped and the column names are terrible, but we can use janitor’s clean_names to take care of that. The other thing to notice is that there are three sheets. The second sheet will need some tidying, and the third sheet is not all that interesting to me. Let’s import the first one.

How’s that done?
library(tidyverse)
library(readxl)
library(janitor)
library(DT)
CPI22 <- readxl::read_excel(path="data/CPI2022_GlobalResultsTrends.xlsx", sheet=1, skip=2) %>% clean_names()
datatable(CPI22)

We also have the time series data. Let’s first import them.

How’s that done?
CPI.Time <- readxl::read_excel(path="data/CPI2022_GlobalResultsTrends.xlsx", sheet=2, skip=2) %>% clean_names()
datatable(CPI.Time)

These data require some tidying with pivot_longer; we will want to grab a cpi_score, rank, sources, and standard_error for each year that we have data. It is worth noting that the ranks only go back to 2017. There are harder and easier ways to do this. I wrote a quick function to take two inputs and then pivot each of the four series separately.

How’s that done?
cleaner <- function(data, string) {
  # Start with the data
  data %>%
    # Use iso3 as the ID and keep every column that starts with string
    select(iso3, starts_with(string)) %>%
    # Pivot all of those columns except iso3
    pivot_longer(
      cols = -iso3,
      # names_prefix strips the leading "string_" from each column name
      names_prefix = paste0(string, "_"),
      # what remains of each name is a four-digit year
      names_to = "year",
      # name the value column after string
      values_to = string
    )
}
CPI.TS.Tidy <- cleaner(CPI.Time,"cpi_score")
Sources.TS.Tidy <- cleaner(CPI.Time,"sources")
StdErr.TS.Tidy <- cleaner(CPI.Time,"standard_error")
Rank.TS.Tidy <- cleaner(CPI.Time, "rank")

Now I can join them back together.

How’s that done?
Panel <- CPI.TS.Tidy %>%
  left_join(Sources.TS.Tidy) %>%
  left_join(StdErr.TS.Tidy) %>%
  left_join(Rank.TS.Tidy) %>%
  mutate(year = as.integer(year))
rm(CPI.TS.Tidy, Sources.TS.Tidy, StdErr.TS.Tidy, Rank.TS.Tidy)

The third sheet is a set of statistically significant changes that I do not much care about.

A Summary

How’s that done?
library(skimr)
Panel %>% skim()
Data summary
Name: Piped data
Number of rows: 1991
Number of columns: 6
Column type frequency: character 1, numeric 5
Group variables: None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
iso3 0 1 3 3 0 181 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
year 0 1.00 2017.00 3.16 2012.00 2014.00 2017.00 2020.00 2022.00 ▇▅▅▅▅
cpi_score 42 0.98 43.03 19.27 8.00 29.00 38.00 56.00 92.00 ▃▇▅▂▂
sources 42 0.98 6.72 1.84 3.00 5.00 7.00 8.00 10.00 ▃▂▇▅▃
standard_error 42 0.98 2.89 1.55 0.41 1.85 2.51 3.49 12.81 ▇▅▁▁▁
rank 911 0.54 89.04 51.63 1.00 45.00 87.00 132.50 180.00 ▇▇▇▇▇

A Map

How’s that done?
library("rnaturalearth")
library("rnaturalearthdata")
world <- ne_countries(scale = "medium", returnclass = "sf")
# create world map using ggplot() function
ggplot(world) +
    geom_sf(fill="pink", color="black", size=0.1, alpha=0.2) +
    theme_void() +
  labs(title="A Starting Point")

Join the Map and the Data

The sf package has special merge methods that I will deploy to combine the two bits of data.

How’s that done?
Map.Data <- merge(world, Panel, by.x="iso_a3", by.y= "iso3")

Now to draw a map.

How’s that done?
# Build an HTML tooltip for the interactive version of the map
Map.Data <- Map.Data %>% mutate(tooltip = paste0(sovereignt, "<br>", year, "<br>CPI: ", cpi_score))
Map.Res <- Map.Data %>% 
  dplyr::filter(year==2022L) %>% 
  ggplot(.) +
    geom_sf(aes(fill=cpi_score, text=tooltip), size=0.1, alpha=0.8) +
  scale_fill_viridis_c() +
    theme_void() +
  labs(title="Perceived Corruption around the World in 2022",
       caption="Data from Transparency International",
       fill = "CPI") + theme(legend.position="bottom")
Map.Res

How’s that done?
library(plotly)
ggplotly(Map.Res, tooltip = "text")

This is enough to wrap as a page for a Shiny app: a year selector and the two maps.
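A minimal sketch of what that page might look like, assuming the `Map.Data` object (with its `tooltip` column) built above is available to the app; the input and output IDs here are hypothetical:

```r
library(shiny)
library(plotly)  # also loads ggplot2

# Hypothetical single-page sketch: one year selector driving both maps.
ui <- fluidPage(
  titlePanel("Perceived Corruption around the World"),
  selectInput("year", "Year:", choices = 2012:2022, selected = 2022),
  plotOutput("cpi_map"),       # static ggplot version
  plotlyOutput("cpi_map_int")  # interactive plotly version
)

server <- function(input, output, session) {
  # One reactive plot; both outputs reuse it.
  map_plot <- reactive({
    Map.Data %>%
      dplyr::filter(year == as.integer(input$year)) %>%
      ggplot() +
      geom_sf(aes(fill = cpi_score, text = tooltip), size = 0.1, alpha = 0.8) +
      scale_fill_viridis_c() +
      theme_void() +
      labs(title = paste("Perceived Corruption in", input$year), fill = "CPI") +
      theme(legend.position = "bottom")
  })
  output$cpi_map     <- renderPlot(map_plot())
  output$cpi_map_int <- renderPlotly(ggplotly(map_plot(), tooltip = "text"))
}

# shinyApp(ui, server)
```

Sharing one reactive between the static and plotly outputs keeps the two maps in sync with the selector.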

A Second Page

I want a second page that will display only the 2022 results in CPI22. There are tons of measures there, many with a lot of missing data, but they cover only the one year. The selector here will be the series to plot.
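A sketch of that second page, assuming CPI22 has first been joined to the world sf object the same way as the panel data; the object names and the example series choices are hypothetical stand-ins for the numeric columns of CPI22:

```r
library(shiny)
library(ggplot2)

# Hypothetical join for the 2022-only sheet, mirroring the panel merge:
# Map22 <- merge(world, CPI22, by.x = "iso_a3", by.y = "iso3")

ui22 <- fluidPage(
  titlePanel("The 2022 Results"),
  # Example choices; in practice, offer the numeric columns of CPI22.
  selectInput("series", "Series to plot:",
              choices = c("cpi_score", "rank", "standard_error")),
  plotOutput("map22")
)

server22 <- function(input, output, session) {
  output$map22 <- renderPlot({
    # .data[[ ]] lets the selected column name drive the fill aesthetic
    ggplot(Map22) +
      geom_sf(aes(fill = .data[[input$series]]), size = 0.1, alpha = 0.8) +
      scale_fill_viridis_c() +
      theme_void() +
      labs(fill = input$series) +
      theme(legend.position = "bottom")
  })
}

# shinyApp(ui22, server22)
```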

References

How’s that done?
knitr::write_bib(names(sessionInfo()$otherPkgs), file="bibliography.bib")


Firke, Sam. 2023. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Massicotte, Philippe, and Andy South. 2023. Rnaturalearth: World Map Data from Natural Earth. https://docs.ropensci.org/rnaturalearth/ https://github.com/ropensci/rnaturalearth.
Müller, Kirill, and Hadley Wickham. 2022. Tibble: Simple Data Frames. https://CRAN.R-project.org/package=tibble.
Sievert, Carson. 2020. Interactive Web-Based Data Visualization with r, Plotly, and Shiny. Chapman; Hall/CRC. https://plotly-r.com.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2022. Plotly: Create Interactive Web Graphics via Plotly.js. https://CRAN.R-project.org/package=plotly.
South, Andy. 2017. Rnaturalearthdata: World Vector Map Data from Natural Earth Used in Rnaturalearth. https://github.com/ropenscilabs/rnaturalearthdata.
Waring, Elin, Michael Quinn, Amelia McNamara, Eduardo Arino de la Rubia, Hao Zhu, and Shannon Ellis. 2022. Skimr: Compact and Flexible Summaries of Data. https://CRAN.R-project.org/package=skimr.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2022a. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.
———. 2022b. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
———. 2023. Forcats: Tools for Working with Categorical Variables (Factors). https://CRAN.R-project.org/package=forcats.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2022. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2022. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Lionel Henry. 2023. Purrr: Functional Programming Tools. https://CRAN.R-project.org/package=purrr.
Wickham, Hadley, Jim Hester, and Jennifer Bryan. 2022. Readr: Read Rectangular Text Data. https://CRAN.R-project.org/package=readr.
Wickham, Hadley, Davis Vaughan, and Maximilian Girlich. 2023. Tidyr: Tidy Messy Data. https://CRAN.R-project.org/package=tidyr.
Xie, Yihui, Joe Cheng, and Xianying Tan. 2023. DT: A Wrapper of the JavaScript Library DataTables. https://github.com/rstudio/DT.